This study provided preliminary results about the performance of DIMTEST for detecting
multidimensionality with data simulated from both compensatory and
noncompensatory models under a latent structure where all items in a test 
were influenced by the same two abilities. For the first case, data were 
simulated to reflect real test data in terms of descriptive statistics and 
classical item characteristics. In this case, DIMTEST did identify some 
degree of departure from essential unidimensionality for data sets from the 
noncompensatory model when the sample size was large and the interability 
correlation was low. For data simulated from the compensatory model, DIMTEST 
results suggested acceptance of the hypothesis of essential unidimensionality.

In the other three cases, data were simulated under various conditions in
which the relative influence of the second dimension was greater than in Case 1.
For these cases, when DIMTEST identified multidimensionality, its power
increased as test length and sample size increased and as interability
correlation decreased. A question that remains unanswered is whether the power
of DIMTEST is monotonically related to the relative magnitude or variability
of the two discrimination vectors.
Compensatory and Noncompensatory Multidimensionality

Detecting Compensatory and Noncompensatory Multidimensionality Using DIMTEST

A fundamental assumption of most commonly used item response theory (IRT) models is
that the test in question measures a single ability. Violating this assumption can seriously bias
item and ability parameter estimation (Ansley & Forsyth, 1985; Way, Ansley, & Forsyth, 1988),
so it is crucial to verify it before applying unidimensional IRT models.

Among the variety of methods proposed to assess unidimensionality, DIMTEST (Stout,
1987, 1990) is a relatively new but very promising statistical procedure that has attracted
considerable attention in recent years. It was first developed by Stout (1987) and was further
improved by Nandakumar and Stout (1993). The procedure is based on the concept of
essential dimensionality, which counts only the dominant dimensions and ignores minor
ones. This concept replaces local independence with the weaker notion of essential
independence, and it justifies the use of unidimensional IRT models once a statistical test has
verified that essential unidimensionality holds for a set of item responses.

To apply DIMTEST, the N items of a test are split into three subtests: assessment
subtests AT1 and AT2, and partitioning subtest PT. AT1 contains M items that can be selected either
through factor analysis or by expert opinion. These items presumably measure the same dominant
ability and are dimensionally distinct from the rest of the items. Another subset of M items is selected
so that its item difficulty distribution matches that of the AT1 items. These items form AT2 and are
used to offset the statistical bias in AT1 arising from short test length and/or extreme
difficulty levels. The remaining N - 2M items form PT, which is used to partition examinees into
subgroups based on their total score on these items. The DIMTEST statistic T is computed from the AT1
and AT2 subtests, based on the within-subgroup differences between the usual variance estimate
and the unidimensional variance estimate. This statistic has been proven to be asymptotically
normally distributed with mean zero and variance one when essential unidimensionality holds
(Stout, 1987).
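The following is a simplified sketch of how T might be computed once the three subtests have been chosen. It assumes a 0/1 NumPy response matrix `X` (examinees by items) and lists of item indices for AT1, AT2, and PT. The names `dimtest_T`, `_subtest_statistic`, and `min_group` are hypothetical; the standard-error expression is our reading of Stout (1987) and should be checked against that source; and none of the refinements behind T' (Nandakumar & Stout, 1993) are included. It is an illustration, not the DIMTEST program.

```python
import numpy as np


def _subtest_statistic(X, items, pt_score, min_group):
    """Standardized sum over PT-score subgroups of (usual - unidimensional) variance."""
    terms = []
    for k in np.unique(pt_score):
        sub = X[pt_score == k][:, items]        # AT responses of examinees in subgroup k
        J = sub.shape[0]
        if J < min_group:                       # assumption: skip sparse subgroups
            continue
        y = sub.sum(axis=1)                     # AT subtest score of each examinee
        p = sub.mean(axis=0)                    # item proportions correct in subgroup k
        dev = y - y.mean()
        var_usual = np.mean(dev ** 2)           # usual variance estimate
        var_unid = np.sum(p * (1 - p))          # unidimensional variance estimate
        mu4 = np.mean(dev ** 4)                 # fourth central moment of y
        delta4 = np.sum(p * (1 - p) * (1 - 2 * p) ** 2)
        # Assumed standard-error form; clipped at 0 to guard against rounding.
        se = np.sqrt(max(mu4 - var_usual ** 2 + delta4, 0.0) / J)
        if se > 0:
            terms.append((var_usual - var_unid) / se)
    if not terms:
        raise ValueError("no PT-score subgroup met the minimum size")
    return np.sum(terms) / np.sqrt(len(terms))


def dimtest_T(X, at1, at2, pt, min_group=20):
    """Hypothetical sketch: the AT2 part is subtracted from the AT1 part to offset bias."""
    X = np.asarray(X)
    pt_score = X[:, pt].sum(axis=1)             # partition examinees by PT total score
    t_at1 = _subtest_statistic(X, at1, pt_score, min_group)
    t_at2 = _subtest_statistic(X, at2, pt_score, min_group)
    return (t_at1 - t_at2) / np.sqrt(2)
```

Dividing the difference by the square root of 2 keeps the result on a standard normal scale only under the assumption that the AT1 and AT2 parts are approximately independent; large positive values point toward multidimensionality.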

DIMTEST has many advantages such as its nonparametric nature, asymptotic theory basis, 
and computational efficiency. Its performance has been evaluated by many studies based on 
simulated and real data. Generally the results indicate that DIMTEST is able to correctly confirm 
unidimensionality for unidimensional datasets and effectively detect multidimensionality for two- 
or three-dimensional datasets (Nandakumar, 1993; Nandakumar & Stout, 1993; Stout, 1987). The 
accuracy of DIMTEST has been found to depend on both sample size and test length, with T 
performing best on tests with more than 25 items and with sample sizes greater than 500 (de 
Champlain & Gessaroli, 1991). Nandakumar and Stout (1993) also found that DIMTEST
performed poorly when a test contained highly discriminating items with guessing present. 
Therefore they revised this procedure to overcome this limitation and automate the determination of 
M, the size of the assessment subtests. The improved DIMTEST, with statistic T', has been shown 
in simulation studies to adhere more closely to the nominal level of significance for unidimensional 
tests and achieve greater power for multidimensional tests (Nandakumar & Stout, 1993). 

When simulated data are used to assess the sensitivity of DIMTEST to multidimensional
data, one needs to choose a multidimensional IRT model as a basis for data generation. Does the
choice of model impact the performance of DIMTEST? In other words, does DIMTEST 
distinguish multiple dimensions equally well with data generated from different multidimensional 
models? This is the major question that has driven the present study. 

Among the multidimensional models that have been proposed, a major difference rests on 
whether compensation occurs among the abilities required to answer the items correctly. Both 
compensatory and noncompensatory models have been proposed. Sympson (1978) proposed a 
multidimensional extension of the unidimensional three-parameter logistic model that can be 
classified as noncompensatory (or partially compensatory). This model can be represented as 
$$
P_j(\boldsymbol{\theta}_i) = c_j + \frac{1 - c_j}{\prod_{h}\left(1 + \exp\left[-1.7\,a_{jh}\left(\theta_{ih} - b_{jh}\right)\right]\right)} \qquad (1)
$$
where $\theta_{ih}$ is the ability parameter for person $i$ on dimension $h$, $a_{jh}$ is the discrimination parameter
for item $j$ on dimension $h$, $b_{jh}$ is the difficulty parameter for item $j$ on dimension $h$, and $c_j$ is the
guessing parameter for item $j$.

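A compensatory multidimensional extension of the three-parameter logistic model was
represented by Doody-Bogan and Yen (1983) as

$$
P_j(\boldsymbol{\theta}_i) = c_j + \frac{1 - c_j}{1 + \exp\left[-1.7\sum_{h} a_{jh}\left(\theta_{ih} - b_{jh}\right)\right]} \qquad (2)
$$

where all parameters are defined as in Equation (1).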

The distinction between the two models can be seen intuitively by comparing the
denominators of Equations (1) and (2). In the noncompensatory model, the denominator is the
product of the denominators for each dimension, while in the compensatory model the effects of each
dimension are combined within the exponential term. Therefore, the compensatory model permits
high ability on one dimension to compensate for low ability on another dimension in terms of
probability of correct response, whereas in the noncompensatory model high ability on one
dimension cannot offset low ability on another dimension outside of a limited range, since the
maximum probability of correct response based on one dimension is the upper bound for the
probability based on the two dimensions.

In addition to model selection, another issue concerning simulating multidimensional data is 
the specification of a latent structure underlying test items. Based on a simple structure pattern, 
items of a test can be partitioned into clusters that are each influenced by a single ability. With a 
less-clear-cut latent structure, a test may contain some "mixed" items that are influenced by more 
than one dimension. In a more extreme situation, each item in a test can be influenced by the same 
multiple dimensions. 

As outlined above, specifying the model and the latent structure are the two major issues to be
considered when simulating multidimensional data. Data can be simulated in different ways
depending on how these two choices are made. A review of the literature suggests that
simulation studies for DIMTEST have almost uniformly used a compensatory model coupled with 
a simple structure to generate data. That the compensatory model has always been the choice is 
largely due to the fact that there is no estimation procedure currently available for the 
noncompensatory model; therefore, no parameter estimates from real data can be used for data 
simulation. This presents the question of how to make the simulated data realistic if a 
noncompensatory model is to be used for data generation. It has been argued that the 
noncompensatory view of dimensionality is more reasonable when a multidimensional test is 
considered to be one that requires the simultaneous application of two or more abilities (Ansley & 
Forsyth, 1985; Sympson, 1978). If this is the case, how well DIMTEST can distinguish multiple 
dimensions when data arise from a noncompensatory model needs to be studied, and the results 
compared to those from compensatory cases. For this kind of study, however, it is very important 
to ensure that the simulated data represent real test data as well as possible. 

With respect to the specification of latent structure, a general approach shared by
previous studies is that a test was taken to consist of a subset of items dependent on θ1 alone,
another subset of items dependent on θ2 alone, and sometimes a third subset of items dependent
on both θ1 and θ2. However, such a simple structure pattern may not represent what one
might typically encounter in real testing situations. With real data, it is sometimes the case that all of
the items in a test are simultaneously influenced by the same multiple abilities. For example, a general
reading ability may influence all of the items in a math problem-solving test. As another example, a test
anxiety factor may influence all the items in a test instead of just a few. It seems necessary to
simulate data to reflect this type of situation when investigating how well DIMTEST can detect 
multidimensionality. 

In recognition of the importance of this type of latent structure, Nandakumar (1991)
conducted a simulation study that included two cases of multidimensional structure. In one case
several minor abilities existed, each influencing only a small group of items, while in the other
case one minor ability existed that influenced all items in the test. In both cases a compensatory
model was used for data simulation. The mean and standard deviation of the item discrimination
parameters for the minor ability were taken to reflect the influence of the minor ability relative
to the major ability. A rough index was further proposed to assess the deviation from essential
unidimensionality due to the joint variation of a1 and a2, defined as the smaller of the
a1 and a2 variances multiplied by a constant. According to the results of that study,
DIMTEST tended to retain the hypothesis of essential unidimensionality when the minor
dimension(s) had a relatively small influence on item scores, and was more likely to reject the
hypothesis when the influence of the minor dimension(s) increased. The rejection rates were also
shown to vary roughly with this index, with higher rejection rates associated with higher index values.
Among simulation studies of DIMTEST, this one is of special interest since it was the first to
simulate data based on this type of latent structure. However, the study only simulated cases
with uncorrelated abilities and uncorrelated item parameters, which limits the extent to which the
results can be generalized to other conditions.

Another study, by Hattie et al., is also noteworthy because it was the first attempt to examine
the performance of DIMTEST using data simulated from a noncompensatory model (Hattie, 
Krakowski, Rogers, & Swaminathan, 1996). In that study, data were generated using a program 
called DIMENSION based on a simple structure pattern. The effectiveness of DIMTEST for 
identifying compensatory and noncompensatory multidimensionality was examined along with 
some other issues. As a result, DIMTEST was found to be sensitive to whether the 
multidimensional data arose from a compensatory or a noncompensatory model. Specifically, for 
data from a compensatory model, the null hypothesis of essential unidimensionality was 
appropriately rejected most of the time, whereas for data from a noncompensatory model, the null 
hypothesis was rejected far less than expected under most conditions. Their conclusion was that 
DIMTEST is only applicable for identifying compensatory multidimensional data. However, this 
study has a major limitation in that it did not address the realism of any of the simulated
data sets, and the way the data were simulated may have been problematic, especially in the
noncompensatory case. The concern centers on the two sets of difficulty values, [-2, -1, 0, 1, 2] and
[-1, -.5, 0, .5, 1], used for data simulation. As has been pointed out (Ansley & Forsyth,
1985), when data are generated from a noncompensatory model, the difficulty (b) values
play a major role in determining the realism of the data sets. Data sets simulated from a
noncompensatory model with b vectors centered at zero have been shown to yield test data
indicative of an uncharacteristically difficult test. Therefore, the b values need to be scaled to have
lower means to avoid this problem.
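A short calculation shows why difficulty values centered at zero make a noncompensatory test unusually hard. Consider, purely for illustration, an examinee with $\theta_{i1} = \theta_{i2} = 0$ and an item with $b_{j1} = b_{j2} = 0$ (the $a$ values do not matter here). Every exponential term then equals $e^{0} = 1$, so under the noncompensatory model in Equation (1) and the compensatory model in Equation (2):

$$
P^{\mathrm{NC}}_j = c_j + \frac{1 - c_j}{(1 + 1)(1 + 1)} = c_j + \frac{1 - c_j}{4},
\qquad
P^{\mathrm{C}}_j = c_j + \frac{1 - c_j}{1 + 1} = c_j + \frac{1 - c_j}{2}.
$$

With $c_j = .2$, this average examinee answers such an item correctly with probability .40 under the noncompensatory model but .60 under the compensatory model; lowering the means of the b values moves the former toward more typical values.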

None of the studies in the literature has examined the performance of DIMTEST for data
simulated from a noncompensatory model with a latent structure in which all items load on the
same dimensions. This represents a situation where all the items of a test are influenced by the
same abilities and all abilities are required simultaneously to answer each item correctly. Returning
to the example of the math problem-solving test, reading ability is considered to influence all
the items in the test; in addition, the two abilities may not be compensatory. For an examinee very
low on the major ability (math problem solving), no degree of competence on the minor ability
(reading) may be able to compensate for this deficiency and lead to a high probability of a correct
response. This situation may be of considerable practical relevance and should be considered in simulation
studies.

The purpose of this study, therefore, was to examine the power of DIMTEST for detecting 
multidimensionality with data simulated using both compensatory and noncompensatory models 
based on a latent structure in which each item is simultaneously influenced by the same two 
abilities. Different simulation cases were considered which varied in terms of relative potency of 
the second dimension, in order to gain knowledge of certain performance characteristics of 
DIMTEST under these conditions. The performance of DIMTEST for identifying 
multidimensionality was assessed and compared across the two models. The effects of test length, 
sample size, interability correlation, and guessing on the power of DIMTEST were also examined. 
Method

Monte Carlo simulations were conducted for four simulation cases, which differed in
the distribution of the item discrimination parameters. For the noncompensatory model, the item
difficulty parameters determine, to a large extent, the realism of the generated datasets; therefore,
the item difficulty parameters were not altered across simulation cases. The relative dominance of
the second dimension was manipulated by changing the distribution characteristics
(mean, SD) of the item discrimination parameters.

Simulation Case 1 

The distribution characteristics for the a and b vectors were adopted from the Way et al.
(1988) study. Specifically, for the noncompensatory model, the a1 values had a mean of 1.23 and
an SD of 0.34, while the a2 values were centered at 0.49 with an SD of 0.11. The two a vectors had
a correlation of -0.29. The b1 values had a mean of -0.33 and an SD of 0.82, while the mean and
SD for the b2 values were -1.03 and .82. The correlation between the b vectors was .38. The c value
was set at .2 for all items. The item parameters for the compensatory model were obtained by
adding the following constants to the corresponding item parameters for the noncompensatory
model: -.20 to each a1 value, .63 to each a2 value, and 1.0 to each b1 value. The b2 values and c
values were unchanged. The rationale for the selected parameter distributions can be found in the
two previous studies (Ansley & Forsyth, 1985; Way, Ansley, & Forsyth, 1988). These sets of
item parameters have also been shown in these studies to yield item responses that closely
resemble actual test data in terms of descriptive statistics, reliability, and difficulty indices, and the
number-correct score distributions resulting from the two models were reasonably similar.
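A minimal sketch of drawing the Case 1 item parameters with NumPy follows. The helper names `bivariate_normal` and `case1_item_parameters` are hypothetical, and the reading that the +1.0 shift applies to b1 (with b2 unchanged) is an assumption about the source's notation. The `a_corr` argument simply exposes the correlation between the two a vectors.

```python
import numpy as np


def bivariate_normal(rng, means, sds, corr, size):
    """Draw `size` pairs from a bivariate normal with given means, SDs, and correlation."""
    cov = np.array([[sds[0] ** 2, corr * sds[0] * sds[1]],
                    [corr * sds[0] * sds[1], sds[1] ** 2]])
    return rng.multivariate_normal(means, cov, size=size)


def case1_item_parameters(n_items, rng, a_corr=-0.29):
    """Hypothetical helper: Case 1 item parameters for both models (columns = dimensions)."""
    a = bivariate_normal(rng, (1.23, 0.49), (0.34, 0.11), a_corr, n_items)
    b = bivariate_normal(rng, (-0.33, -1.03), (0.82, 0.82), 0.38, n_items)
    c = np.full(n_items, 0.2)
    noncomp = {"a": a, "b": b, "c": c}
    # Compensatory parameters: a1 - .20, a2 + .63, b1 + 1.0 (assumed to be b1);
    # b2 and c are left unchanged.
    comp = {"a": a + np.array([-0.20, 0.63]),
            "b": b + np.array([1.0, 0.0]),
            "c": c.copy()}
    return noncomp, comp


rng = np.random.default_rng(1988)
noncomp, comp = case1_item_parameters(40, rng)
print(noncomp["a"].mean(axis=0), comp["a"].mean(axis=0))
```

Because the compensatory parameters are derived from the same draw, any difference between the two models' datasets comes from the model form and the fixed shifts rather than from separate sampling of item parameters.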

To generate binary item responses, the a1 and a2 values and the b1 and b2 values were each
generated from a bivariate normal distribution with the corresponding means, SDs, and correlations
specified above. Examinee abilities were generated from a bivariate normal distribution with both
means zero, both variances one, and a given interability correlation (.3 or .7). For each
simulated examinee, the probability of correctly answering each item was computed using one of
the two models with the corresponding item parameters and the examinee's generated abilities.
If a uniform random deviate in the interval (0, 1) was less than or equal to the computed
probability, the examinee was considered to have answered the item correctly and was given a
score of 1; otherwise a score of 0 was given.
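The response-generation steps just described can be sketched in vectorized NumPy as below. The function names are hypothetical; the item arrays hold one row per item and one column per dimension, matching the two-dimensional setup here. NumPy's `Generator.uniform` draws from the half-open interval [0, 1) rather than (0, 1), which makes no practical difference to the comparison.

```python
import numpy as np


def simulate_abilities(n, rho, rng):
    """Abilities from a bivariate normal with means 0, variances 1, correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)


def response_probabilities(theta, a, b, c, model):
    """P(correct) for every examinee x item under Equation (1) or (2).

    theta: (n, 2); a, b: (items, 2); c: (items,).
    """
    # z[i, j, h] = 1.7 * a_jh * (theta_ih - b_jh)
    z = 1.7 * a[None, :, :] * (theta[:, None, :] - b[None, :, :])
    if model == "noncompensatory":
        core = np.prod(1.0 / (1.0 + np.exp(-z)), axis=2)   # product over dimensions
    elif model == "compensatory":
        core = 1.0 / (1.0 + np.exp(-z.sum(axis=2)))        # sum inside one exponent
    else:
        raise ValueError(f"unknown model: {model}")
    return c[None, :] + (1.0 - c[None, :]) * core


def simulate_responses(theta, a, b, c, model, rng):
    """Score 1 when a uniform [0, 1) deviate is <= the model probability, else 0."""
    p = response_probabilities(theta, a, b, c, model)
    return (rng.uniform(size=p.shape) <= p).astype(np.int8)


rng = np.random.default_rng(7)
theta = simulate_abilities(500, 0.3, rng)
a = np.tile([1.23, 0.49], (20, 1))   # illustrative fixed item parameters
b = np.tile([-0.33, -1.03], (20, 1))
c = np.full(20, 0.2)
X = simulate_responses(theta, a, b, c, "noncompensatory", rng)
print(X.shape, X.mean())
```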

The design of the study used three sample sizes (500, 1000, 2000). Each dataset was
partitioned into two groups. One group (of size 200, 300, or 500, respectively) was used for factor
analysis to select the DIMTEST AT1 items, and the other group (of size 300, 700, or 1500,
respectively) was used to compute the statistic T.

In addition, three test lengths (20, 40, 50), two levels of interability correlation (.3, .7),
and two choices of model (compensatory, noncompensatory) were used. All factors were
completely crossed, resulting in a total of 36 combinations. Each combination was replicated 100
times for a total of 3600 datasets, with new examinee responses simulated each time.
DIMTEST was applied to each dataset. For all DIMTEST runs, the default method of factor
analysis was used to select AT1 items, and the Wilcoxon rank sum test (with a nominal level
of .05) was used to check the difficulty of the selected AT1 items. The number of rejections over
100 replications was recorded.
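The Case 1 design can be enumerated directly, which also confirms the cell and dataset counts. The constant and function names are illustrative; in particular, the source does not say how examinees were assigned to the factor-analysis and T-computation groups, so the random split below is an assumption.

```python
from itertools import product

import numpy as np

# Case 1 factors as listed above; each sample size maps to
# (factor-analysis group size, T-computation group size).
SAMPLE_SPLITS = {500: (200, 300), 1000: (300, 700), 2000: (500, 1500)}
TEST_LENGTHS = (20, 40, 50)
INTERABILITY_RHOS = (0.3, 0.7)
MODELS = ("compensatory", "noncompensatory")
REPLICATIONS = 100

# Completely crossed design: 3 x 3 x 2 x 2 cells.
DESIGN = list(product(SAMPLE_SPLITS, TEST_LENGTHS, INTERABILITY_RHOS, MODELS))


def split_examinees(n, rng):
    """Assign examinees to the AT1-selection and T-computation groups.

    Random assignment is an assumption; the study does not say how the split was made.
    """
    n_fa, n_t = SAMPLE_SPLITS[n]
    order = rng.permutation(n)
    return order[:n_fa], order[n_fa:n_fa + n_t]


print(len(DESIGN), len(DESIGN) * REPLICATIONS)  # 36 cells, 3600 datasets
```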

Simulation Case 2 

In this case the mean of the a2 values was increased from .49 to 1.23. Thus the distributions of
the a1 and a2 values had the same mean of 1.23 but different SDs, with SD(a1) = 0.34 and SD(a2) = 0.11.
The purpose was to explore the sensitivity of DIMTEST to increased relative
potency of the second dimension in terms of the magnitude of the item discrimination parameters. Data
were simulated with two levels of test length (20, 40), three levels of sample size (500, 1000,
2000), three levels of interability correlation (0, .3, .7), two levels of guessing (0, .2), and two
choices of model (compensatory, noncompensatory). The c = 0 and ρ = 0 conditions were added
to examine the behavior of DIMTEST under more extreme circumstances. Each of the 72
combinations of factors was replicated 100 times, resulting in 7200 datasets, with new examinee
responses simulated each time. The procedures for generating item responses and for DIMTEST
runs were the same as in Case 1.

Simulation Case 3 

In this case the means of the a1 and a2 values remained the same as in Case 1, while the SD of the
a2 values was increased from .11 to .34; thus, the distributions of the a1 and a2 values had the same
SD of 0.34 but different means, with mean(a1) = 1.23 and mean(a2) = 0.49. This was intended to
examine the performance of DIMTEST when the relative potency of the second dimension was increased
in terms of the variability of the item discrimination parameters. Comparisons could also be made of
rejection rates across the three simulation cases, which reflected different distributions of item
discrimination parameters. As in Case 2, data were simulated with two levels of test length (20, 40),
three levels of sample size (500, 1000, 2000), three levels of interability correlation (0, .3, .7),
two levels of guessing (0, .2), and two choices of model (compensatory, noncompensatory).
The procedures for generating item responses and for DIMTEST runs remained the same as in
previous cases.

Simulation Case 4 

In this case both the a1 and a2 values had a mean of 1.23 and an SD of 0.34. Interability
correlation and guessing were both set to zero. In addition to the -0.29 correlation between the a1
and a2 values specified in Case 1, a correlation of 0 was included to identify the effect of the
correlation of the a vectors on the power of DIMTEST. The purpose of this simulation case was to
explore the power of DIMTEST in a condition where multidimensionality might be most extreme, with
no guessing and zero correlation between dimensions, and with the two dimensions equally potent in
terms of both magnitude and variability of item discrimination parameters. Data were simulated with
two levels of test length (20, 40), three levels of sample size (500, 1000, 2000), two levels of
correlation between the a vectors (0, -0.29), and two choices of model (compensatory, noncompensatory).
There were 24 combinations of factors, with each combination replicated 100 times. Again, the
procedures for item response generation and DIMTEST runs were the same as outlined in Case 1.

Results 

The results are presented separately for each simulation case. Within each case, rejection 
rates are tabulated separately for the two models. The rows and columns are arranged such that the 
pattern of numbers is easily captured, thus facilitating the interpretations. 

Simulation Case 1 

The rejection rates over 100 trials for simulated datasets with varying test length
(N), sample size (S), and interability correlation (ρ) are presented in Table 1 and Table 2 for the
noncompensatory model and the compensatory model, respectively. The results are shown for
three significance levels (.01, .05, and .10), with T or T' used. In these tables, each cell
(consisting of 3 rows and 6 columns) refers to three datasets with the same test length and
interability correlation. Within each cell, rejection rates can be compared across T
and T', across α levels, and across sample sizes. The corresponding rows of different cells allow
comparison of the effects of test length and interability correlation.
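Because T (and T') is referred to the standard normal distribution and large values indicate multidimensionality, each tabled rejection rate is the proportion of the 100 replications whose statistic exceeds the upper-tail critical value at the chosen α. A sketch using only the Python standard library (`rejection_rates` is a hypothetical helper):

```python
from statistics import NormalDist

ALPHAS = (0.01, 0.05, 0.10)


def rejection_rates(t_values, alphas=ALPHAS):
    """Proportion of replications rejecting H0 in a one-sided upper-tail z test."""
    rates = {}
    for alpha in alphas:
        crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = .05
        rates[alpha] = sum(t > crit for t in t_values) / len(t_values)
    return rates


print(rejection_rates([0.0, 1.7, 2.0, 2.5, 3.0]))  # {0.01: 0.4, 0.05: 0.8, 0.1: 0.8}
```

In the study each list would hold the 100 statistics from one cell, so the rates move in steps of .01.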

Noncompensatory Model 

It can be seen in Table 1 that, as expected, the rejection rates for T' were always higher than or
equal to those for T, and a more liberal α level resulted in higher rejection rates. For
datasets with high interability correlation (.7), the rejection rates were all very low. The highest
values at the α levels of .01, .05, and .10 were 2%, 4%, and 9%, respectively, using T, and 4%,
7%, and 14%, respectively, using T'. Therefore, the datasets with high interability correlations
would all be classified as essentially unidimensional in this case.
Insert Table 1 about here



For datasets with low interability correlations (.3), the results were quite different. For
datasets with small (500) and moderate (1000) sample sizes, the rejection rates were all close to
nominal levels when T was used and a little higher than nominal levels when T' was used, but
even the larger values generally did not differ much from the nominal levels. For datasets with
large (2000) sample sizes, however, the rejection rates were considerably higher than the nominal levels.
This was especially true with N=50 and S=2000, where the rejection rates at the significance levels
of .01, .05, and .10 were 7%, 17%, and 28%, respectively, using T, and 12%, 23%, and
33%, respectively, using T'. This would be indicative of some degree of departure from essential
unidimensionality. It was also observed that, in some cases, the rejection rates for sample sizes of
500 were higher than those for sample sizes of 1000, which might be due to sampling error.
DIMTEST is based on large-sample theory; therefore, the results from small samples may be
unstable and inaccurate.
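As a rough yardstick for how far an observed rate can stray from the nominal level by chance alone, the 100 replications in a cell can be treated as independent Bernoulli trials with rejection probability α, giving a standard error of

$$
\mathrm{SE} = \sqrt{\frac{\alpha(1-\alpha)}{100}} \approx .010,\ .022,\ .030 \quad \text{for } \alpha = .01,\ .05,\ .10.
$$

On this normal approximation (crude at α = .01), rates within about two standard errors of nominal, for example roughly 1% to 9% at α = .05, are consistent with sampling error, whereas the 28% obtained with T at α = .10 for N=50 and S=2000 lies about six standard errors above .10.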

It can be concluded that DIMTEST detected some degree of multidimensionality for 
datasets simulated from the noncompensatory model in Case 1 when the sample size was large and 
interability correlation was low. 

Compensatory Model 

From Table 2, it is clear that the rejection rates were all very low. Across all levels of
test length, sample size, and interability correlation, the highest rejection rates at the significance
levels of .01, .05, and .10 were 1%, 5%, and 7%, respectively, using T, and 3%, 8%, and 13%,
respectively, using T'. Thus, for all of the datasets simulated from the compensatory model in
this case, the DIMTEST results would imply acceptance of the null hypothesis of essential
unidimensionality.
Insert Table 2 about here



Simulation Case 2 

For datasets simulated in this case, the rates of correctly rejecting the assumption of 
essential unidimensionality are presented in Table 3 and Table 4 for the noncompensatory model 
and compensatory model, respectively. 

Noncompensatory Model 

Table 3 shows that DIMTEST rejected the hypothesis of essential unidimensionality for 
datasets simulated from the noncompensatory model under various conditions, and the power was 
dependent on test length, sample size, interability correlation, and the presence or absence of
guessing.



Insert Table 3 about here 



The effect of guessing can be seen clearly by contrasting the top portion with the bottom
portion of Table 3. In general, the power decreased when guessing was present. However, under
the conditions where both the test length and the sample size were large (N=40, S=2000) and the
interability correlation was zero (ρ=0), the power was not greatly affected by guessing. The power
increased when test length and sample size increased, and decreased when interability correlation
increased, in agreement with the results of previous studies based on data simulated
from simple structure. For datasets with long tests (N=40) and moderate or large sample sizes
(S=1000 or 2000), DIMTEST maintained good power when interability correlation increased from
0 to .3. For all the datasets with a test length of 40, a sample size of 1000 or 2000, zero guessing, and
an interability correlation of 0 or .3, the power was extremely high, ranging from 94% to 100% even
when a stringent α level of .01 was used. Thus DIMTEST was very powerful for detecting
multidimensionality for datasets simulated from the noncompensatory model when test length and
sample size were large, interability correlation was low, and no guessing was present.

It was also observed that for various combinations of the following factors (nonzero
guessing, short test length, small sample size, and high interability correlation), the rejection rates
of DIMTEST dropped to nominal levels. DIMTEST may lack power under these conditions.

Compensatory Model

As shown in Table 4, the rejection rates for the compensatory model were all very low.
Across all factors, the highest rejection rates at the α levels of .01, .05, and .10 were 1%, 4%, and
10%, respectively, using T, and 4%, 10%, and 15%, respectively, using T'. It seems that
DIMTEST retained the hypothesis of essential unidimensionality for all the datasets generated from
the compensatory model in this case.

Insert Table 4 about here 



Simulation Case 3 

For datasets simulated in this case, the results for the noncompensatory model and the 
compensatory model are shown in Table 5 and Table 6, respectively. 

Noncompensatory Model 

It can be seen from Table 5 that the patterns of effects of test length, sample size, guessing, 
and interability correlation were similar to those observed in Table 3, but the power was generally 
lower. 



Insert Table 5 about here 



With c=.2, the datasets were identified as unidimensional by DIMTEST except when
N=40 and ρ=0, where some degree of multidimensionality was detected. With c=0, DIMTEST
identified some degree of multidimensionality for all the datasets with N=40 and ρ=0, and the
rejection rates were generally higher than in the corresponding cases with c=.2. Multidimensionality
was also identified for some additional cases with N=20, ρ=0, and S=1000 or 2000, or with
N=40, ρ=.3, and S=1000 or 2000.

In general, DIMTEST did not show great power for detecting noncompensatory
multidimensionality for data generated under this condition. Only when both the test length
and sample size were large (N=40, S=2000), the interability correlation was zero (ρ=0), and no
guessing was present (c=0) was the power acceptable, with rejection rates at α levels of .01, .05,
and .10 of 58%, 74%, and 81%, respectively, using T, and 66%, 78%, and 82%, respectively,
using T'.

Compensatory Model 

Rejection rates for datasets generated from the compensatory model are shown in Table 6.
The power was considerably higher than that for the noncompensatory cases under the same
conditions. For example, for datasets with N=40, S=1000, and ρ=0, the rejection rates using T' at
the α levels of .01, .05, and .10 were 28%, 54%, and 63%, respectively, for the
noncompensatory model, and 75%, 93%, and 95%, respectively, for the compensatory
model. Again, the same pattern of effects of test length, sample size, interability correlation, and
guessing was observed as described before. When guessing was not present, DIMTEST
maintained good power for long tests (N=40) and large sample sizes (S=2000) as the
interability correlation increased from 0 to .3, as was also observed for the noncompensatory
data in Case 2.



Insert Table 6 about here 



It can be seen that datasets with ρ=.7 were uniformly classified as unidimensional, and
the same was true for datasets with nonzero guessing and an interability correlation of ρ=.3. For
datasets with long tests (N=40), large sample sizes (S=2000), zero guessing (c=0), and
zero to low interability correlations (ρ=0 or ρ=.3), DIMTEST demonstrated good power for
detecting compensatory multidimensionality.

Simulation Case 4 

For data simulated in this case, the rejection rates are presented in Table 7 and Table 8 for 
the noncompensatory model and the compensatory model, respectively. 

Noncompensatory Model 

From Table 7, it can be seen that the rejection rates for the noncompensatory model were
comparable to those observed in Case 2 (see Table 3) under the same conditions of test length,
sample size, interability correlation, and guessing, and were higher than those observed in Case 3
(see Table 5) under the same conditions. Also, the rejection rates were similar across the two levels
of correlation between the a1 and a2 values. Thus the magnitudes of the discrimination parameters
appear to dictate the degree of multidimensionality for datasets simulated from the noncompensatory
model, and the performance of DIMTEST was not influenced by the correlation of the a1 and a2 values.



Insert Table 7 and Table 8 about here 



Compensatory Model 

As shown in Table 8, the rejection rates for the compensatory model varied considerably across 
the two levels of r(a1, a2), with lower rejection rates when r(a1, a2)=-.29 than at the other level. This 
suggests that the correlation between a1 and a2 affects the power of DIMTEST for detecting 
compensatory multidimensionality. Contrasting the portion of Table 8 with r(a1, a2)=-.29 with the 
lower-left portion of Table 6 reveals that, under these conditions, the power was higher when 
mean(a1) ≠ mean(a2) than when mean(a1) = mean(a2). Recall that for all the datasets with 
mean(a1) ≠ mean(a2) and SD(a1) ≠ SD(a2), and for all the datasets with mean(a1) = mean(a2) but 
SD(a1) ≠ SD(a2), DIMTEST appeared to have limited power for detecting compensatory 
multidimensionality. This result implies an interaction effect on power between the magnitude 
and the variability of the item discrimination parameters. 



Discussion 

This study provided preliminary results on the performance of DIMTEST for detecting 
multidimensionality with data simulated from both compensatory and noncompensatory models, 
under a latent structure in which all items in a test were influenced by the same two abilities. The 
datasets simulated in Case 1 were intended to reflect real test data in terms of descriptive statistics 
and classical item characteristics. Way et al. (1988) found that, for data simulated as in 
Case 1, unidimensional IRT procedures yielded biased estimates of item and ability parameters. 
It was therefore of interest to know what DIMTEST would conclude about the essential 
unidimensionality of data simulated this way. As the results showed, DIMTEST did identify some 
degree of departure from essential unidimensionality for datasets simulated from the 
noncompensatory model when the sample size was large and the interability correlation was low, 
although the rejection rates were not very high. In contrast, for all of the data simulated from the 
compensatory model, DIMTEST results suggested acceptance of the hypothesis of essential 
unidimensionality. In general, the datasets simulated in this case can be characterized as having one 
dominant dimension and one minor dimension, so it is not surprising that rejection rates 
were low in most cases. 

In Cases 2 through 4, data were simulated under various conditions in which the relative 
influence of the second dimension was greater than in Case 1. It should be noted that the data 
simulated in these cases may not be very realistic, and the score distributions resulting from the 
two models may not be comparable. From a theoretical perspective, however, these cases did 
allow an assessment and comparison of the performance of DIMTEST for datasets with different 
distributional characteristics of the item discrimination parameters. The results suggested that, for the 
noncompensatory model, both the magnitude and the variability of the a2 values were related to 
multidimensionality, with magnitude dominating over variability in determining 
multidimensionality. For the compensatory model, by contrast, only the variability of the a2 values 
seemed to reflect the degree of multidimensionality, and the likelihood of rejecting the hypothesis 
of essential unidimensionality depended on the interrelation between the a1 and a2 values. This 
finding has some implications for the use of P, proposed by Nandakumar (1991) as a rough index 
of departure from essential unidimensionality. Because this index was developed for the case of 
uncorrelated a1 and a2 values, it may not be applicable to datasets with a nonzero correlation between 
a1 and a2. 

In previous studies, DIMTEST was found to work sufficiently well with data 
simulated from a simple-structure specification with compensatory abilities. 
Given that DIMTEST uses a factor analytic procedure to select AT1 items, it should intuitively 
work well with this type of latent structure. However, the findings from this study suggest that, 
for datasets in which all items were influenced simultaneously by multiple abilities, the 
noncompensatory model seemed to be the better approach for modeling truly 
non-unidimensional item responses. On the other hand, it might be argued that the DIMTEST 
procedure lacks power, rather than that the compensatory model yields more unidimensional-like data. 
To address this issue, a comparative study is needed in which the number of dimensions 
underlying this type of data is also tested with other approaches for assessing unidimensionality. 
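The difference between the two models can be seen by evaluating both item response functions for the same examinee. The sketch below assumes the widely used logistic forms with scaling constant D = 1.7 and lower asymptote c: the compensatory function applies one logistic curve to a weighted sum of the abilities, whereas the noncompensatory function multiplies one logistic term per ability. The item parameters are hypothetical, and the parameterization used to generate the data in this study may differ.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_compensatory(theta, a, d, c=0.0):
    """Abilities combine additively, so a high theta1 can offset a low theta2."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return c + (1.0 - c) * logistic(D * z)

def p_noncompensatory(theta, a, b, c=0.0):
    """Product of per-ability terms, so a low theta on either ability caps P."""
    prod = 1.0
    for ak, bk, tk in zip(a, b, theta):
        prod *= logistic(D * ak * (tk - bk))
    return c + (1.0 - c) * prod

# Hypothetical item: equal discriminations, zero intercept and difficulties.
a = (1.0, 1.0)
for theta in [(2.0, -2.0), (0.0, 0.0), (2.0, 2.0)]:
    pc = p_compensatory(theta, a, d=0.0)
    pn = p_noncompensatory(theta, a, b=(0.0, 0.0))
    print(theta, round(pc, 3), round(pn, 3))
```

For an examinee at θ = (2, -2), the compensatory probability is .50, the same as for an examinee at (0, 0), because the high θ1 fully offsets the low θ2; the noncompensatory probability is only about .03, because the low θ2 caps the product.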

With respect to the effects of test length, sample size, interability correlation, and guessing, 
findings across the two models were consistent with those for datasets simulated under simple 
structure. Generally speaking, in the situations where DIMTEST identified multidimensionality, 
the power increased as test length and sample size increased and as interability correlation 
decreased. In addition, the power decreased when guessing was present, except when both test 
length and sample size were large (N=40, S=2000). For datasets with various combinations of 
short test length (N=20), small sample size (S=500), high correlation between dimensions (ρ=.7), 
and nonzero guessing (c=.2), DIMTEST appeared to have less power. 
This study should foster further research on DIMTEST and other methods for assessing 
unidimensionality using data simulated from an alternative model and a non-simple-structure 
specification. Although the study was intended to be comprehensive in terms of the factors involved, 
one question remains unanswered: are there monotonic relationships between the power of 
DIMTEST and the degree of relative magnitude and/or variability of the two discrimination 
vectors? Future simulation studies with more systematic variation of these factors are needed to 
address this question. 
Table 1. Rejection Rates (%) for Noncompensatory Model with Mean(a1)=1.23, Mean(a2)=0.49, SD(a1)=0.34, SD(a2)=0.11, r(a1, a2)=-0.29 